Differential Training of Rollout Policies

Author

  • Dimitri P. Bertsekas
Abstract

We consider the approximate solution of stochastic optimal control problems using a neuro-dynamic programming/reinforcement learning methodology. We focus on the computation of a rollout policy, which is obtained by a single policy iteration starting from some known base policy and using some form of exact or approximate policy improvement. We indicate that, in a stochastic environment, the popular methods of computing rollout policies are particularly sensitive to simulation and approximation error, and we present more robust alternatives, which aim to estimate relative rather than absolute Q-factor and cost-to-go values. In particular, we propose a method, called differential training, that can be used to obtain an approximation to cost-to-go differences rather than cost-to-go values by using standard methods such as TD(λ) and λ-policy iteration. This method is suitable for recursively generating rollout policies in the context of simulation-based policy iteration methods.
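The abstract's central observation, that differences of noisy Q-factor estimates can be far more robust than the absolute estimates themselves, can be illustrated with a toy one-stage problem. The sketch below is invented for illustration (the model, the control set, and the noise structure are all assumptions, not taken from the paper): when the two controls are simulated with common random numbers, the shared noise cancels in the difference, whereas independent simulations do not cancel. The paper's differential training method goes further, fitting an approximator to cost-to-go differences with TD(λ)-style updates, but the variance effect motivating it is the same.

```python
import random

def simulate_q_sample(u, w):
    # Hypothetical toy model: control u in {0, 1}, exogenous noise w.
    # The "true" Q-factors are Q(0) = 1.0 and Q(1) = 1.1; each sample
    # is the true value perturbed additively by the noise.
    base = 1.0 if u == 0 else 1.1
    return base + w

def q_difference_estimates(n_samples, seed=0):
    """Compare two estimators of Q(x,0) - Q(x,1): independent noise
    per control vs. common random numbers (shared noise)."""
    rng = random.Random(seed)
    indep, paired = [], []
    for _ in range(n_samples):
        # Independent estimation: separate noise draw for each control.
        w0, w1 = rng.gauss(0, 1), rng.gauss(0, 1)
        indep.append(simulate_q_sample(0, w0) - simulate_q_sample(1, w1))
        # Differential estimation: the same noise drives both controls,
        # so it cancels in the difference.
        w = rng.gauss(0, 1)
        paired.append(simulate_q_sample(0, w) - simulate_q_sample(1, w))
    mean = lambda xs: sum(xs) / len(xs)
    var = lambda xs: sum((x - mean(xs)) ** 2 for x in xs) / len(xs)
    return var(indep), var(paired)
```

With additive shared noise the paired differences are essentially constant, so their sample variance collapses to (near) zero, while the independent estimator's variance is roughly the sum of the two noise variances. A rollout policy that ranks controls by such differences is correspondingly less likely to be misled by simulation error.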


Similar articles

Rollout Policies for Dynamic Solutions to the Multivehicle Routing Problem with Stochastic Demand and Duration Limits

We develop a family of rollout policies based on fixed routes to obtain dynamic solutions to the vehicle routing problem with stochastic demand and duration limits (VRPSDL). In addition to a traditional one-step rollout policy, we leverage the notions of the pre- and post-decision state to distinguish two additional rollout variants. We tailor our rollout policies by developing a dynamic decompos...


Average-Case Performance of Rollout Algorithms for Knapsack Problems

Rollout algorithms have demonstrated excellent performance on a variety of dynamic and discrete optimization problems. Interpreted as an approximate dynamic programming algorithm, a rollout algorithm estimates the value-to-go at each decision stage by simulating future events while following a heuristic policy, referred to as the base policy. While in many cases rollout algorithms are guarantee...
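The value-to-go estimation described above can be sketched on a deterministic 0/1 knapsack. Everything below is a hypothetical illustration, not the paper's analysis: the base policy is a naive greedy rule that accepts items in their given order when they fit, and the one-step rollout compares taking vs. skipping each item by completing both choices with that base policy.

```python
def greedy_value(items, capacity):
    # Base heuristic: accept each item in the given order if it fits.
    total = 0
    for value, weight in items:
        if weight <= capacity:
            total += value
            capacity -= weight
    return total

def rollout_knapsack(items, capacity):
    # One-step rollout: at each stage, estimate the value-to-go of
    # "take" and "skip" by running the greedy base policy on the
    # remaining items, then commit to the better-looking decision.
    total = 0
    for i, (value, weight) in enumerate(items):
        rest = items[i + 1:]
        q_skip = greedy_value(rest, capacity)
        q_take = (value + greedy_value(rest, capacity - weight)
                  if weight <= capacity else float("-inf"))
        if q_take >= q_skip:
            total += value
            capacity -= weight
    return total
```

On an adversarially ordered instance such as `items = [(10, 5), (12, 4), (12, 4)]` with capacity 8, the greedy base policy grabs the first item and collects 10, while the rollout sees that skipping it frees capacity for both remaining items and collects 24, illustrating the cost-improvement property of rollout over its base policy.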


Parallel Rollout for Online Solution of Partially Observable Markov Decision Processes

We propose a novel approach, called parallel rollout, to solving (partially observable) Markov decision processes. Our approach generalizes the rollout algorithm of Bertsekas and Castanon (1999) by rolling out a set of multiple heuristic policies rather than a single policy. In particular, the parallel rollout approach aims at the class of problems where we have multiple heuristic policies avai...
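The generalization described above, scoring each action by the best continuation over a *set* of base policies rather than a single one, can be sketched generically. The sketch below is an illustrative assumption-laden toy, not the paper's algorithm: the problem interface (`step`, `is_goal`, `actions`) and the chain example in the usage note are all invented for illustration, and it handles only deterministic cost-minimization with a single simulated continuation per policy.

```python
def simulate_policy(state, policy, step, is_goal, max_steps=100):
    # Roll the given base policy forward from `state`, accumulating cost
    # until the goal is reached (or a horizon cutoff).
    cost = 0.0
    for _ in range(max_steps):
        if is_goal(state):
            return cost
        state, stage_cost = step(state, policy(state))
        cost += stage_cost
    return cost  # horizon cutoff

def parallel_rollout_action(state, actions, base_policies, step, is_goal):
    # Parallel rollout: Q(state, u) = stage cost of u plus the BEST
    # (minimum-cost) continuation over the whole set of base policies.
    best_action, best_q = None, float("inf")
    for u in actions(state):
        nxt, stage_cost = step(state, u)
        q = stage_cost + min(simulate_policy(nxt, pi, step, is_goal)
                             for pi in base_policies)
        if q < best_q:
            best_action, best_q = u, q
    return best_action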


Restocking-Based Rollout Policies for the Vehicle Routing Problem with Stochastic Demand and Duration Limits

We develop restocking-based rollout policies to make real-time, dynamic routing decisions for the vehicle routing problem with stochastic demand and duration limits. Leveraging dominance results, we develop a computationally tractable method to estimate the value of an optimal restocking policy along a fixed route. Embedding our procedure in rollout algorithms, we show restocking-based rollout ...


Solution methodologies for vehicle routing problems with stochastic demand

We present solution methodologies for vehicle routing problems (VRPs) with stochastic demand, with a specific focus on the vehicle routing problem with stochastic demand (VRPSD) and the vehicle routing problem with stochastic demand and duration limits (VRPSDL). The VRPSD and the VRPSDL are fundamental problems underlying many operational challenges in the fields of logistics and supply chain m...



Journal title:

Volume   Issue

Pages  -

Publication date: 1997